Runtime home mapping for effective memory resource usage

Authors

Mario Lodde

Jose Flich

Abstract

In tiled chip multiprocessors (CMPs), last-level cache (LLC) banks are usually shared but distributed among the tiles. A static mapping of cache blocks to the LLC banks leads to poor efficiency, since a block may be mapped far from the tiles that actually access it. Dynamic policies either rely on a static mapping of blocks to a set of banks (D-NUCA) or rely on the OS to dynamically load pages to statically mapped addresses (first-touch). In this paper, we propose Runtime Home Mapping (RHM), a new dynamic approach in which the LLC home bank is determined at runtime by the memory controller when the block is fetched from main memory. RHM tries to map each block as close as possible to the requestor, thus reducing execution time and message latencies. Block migration and replication provide further improvements over basic RHM. In a further optimization, we eliminate the directory structure. All these optimizations involve specific NoC optimizations and co-designs. Results with PARSEC and SPLASH2 applications, including comparisons with alternative solutions, show that RHM achieves average reductions of 41% and 35% in load and store latencies, respectively, compared to static mapping. This leads to an average reduction of 28% in application execution time.
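The contrast between static mapping and a runtime choice of home bank can be illustrated with a small sketch. This is not the authors' mechanism, which the abstract does not detail. It assumes a hypothetical 4x4 mesh with one LLC bank per tile, 64-byte blocks, a static policy that interleaves blocks across banks by address, and a runtime policy that picks the nearest bank (in mesh hops) that still has free capacity. All names (`static_home`, `runtime_home`, `free_ways`) are invented for illustration.

```python
from typing import List, Tuple

MESH_WIDTH = 4                        # assumption: 4x4 tiled CMP, one LLC bank per tile
NUM_TILES = MESH_WIDTH * MESH_WIDTH
BLOCK_OFFSET_BITS = 6                 # assumption: 64-byte cache blocks


def tile_xy(tile: int) -> Tuple[int, int]:
    """Return the (x, y) mesh coordinates of a tile id."""
    return tile % MESH_WIDTH, tile // MESH_WIDTH


def hops(a: int, b: int) -> int:
    """Manhattan distance between two tiles, a proxy for NoC message latency."""
    ax, ay = tile_xy(a)
    bx, by = tile_xy(b)
    return abs(ax - bx) + abs(ay - by)


def static_home(block_addr: int) -> int:
    """Static mapping: the home bank depends only on the block address."""
    return (block_addr >> BLOCK_OFFSET_BITS) % NUM_TILES


def runtime_home(requestor: int, free_ways: List[int]) -> int:
    """Hypothetical runtime policy: nearest bank with free capacity.

    free_ways[t] is an assumed per-bank count of free entries. Ties are
    broken by the lowest tile id. If no bank has room, fall back to the
    requestor's own bank (a replacement would then be needed).
    """
    candidates = [t for t in range(NUM_TILES) if free_ways[t] > 0]
    if not candidates:
        return requestor
    return min(candidates, key=lambda t: (hops(requestor, t), t))


if __name__ == "__main__":
    requestor = 5
    addr = 0x1040
    free = [1] * NUM_TILES
    s = static_home(addr)
    r = runtime_home(requestor, free)
    print(f"static home: tile {s}, {hops(requestor, s)} hop(s) from requestor")
    print(f"runtime home: tile {r}, {hops(requestor, r)} hop(s) from requestor")
```

In this toy model the static home is fixed by the address regardless of who asks for the block, while the runtime choice follows the requestor. A real design would also have to record where each block was placed so that later requests can find it, which the sketch leaves out.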

Similar resources

Garbage Collector Memory Accounting in Language-Based Systems

Language run-time systems are often called upon to safely execute mutually distrustful tasks within the same runtime, protecting them from other tasks’ bugs or otherwise hostile behavior. Well-studied access controls exist in systems such as Java to prevent unauthorized reading or writing of data, but techniques to measure and control resource usage are less prevalent. In particular, most langu...

Low Latency and Memory Efficient Viterbi Decoder Using Modified State-Mapping Method

In this paper, a new implementation of the Viterbi decoder is proposed. The Modified State-Mapping VD algorithm combines the TB algorithm with the RE algorithm. By updating the starting point of the state for each memory bank, and by using Trace Back and Trace Forward information, LIFO (Last Input First Output) operation can be eliminated, which reduces the latency of the TB algorithm and decre...

Cutless FPGA Mapping

The paper presents a new algorithm for FPGA technology mapping into K-input LUTs. The algorithm avoids cut enumeration by incrementally computing and updating one good K-feasible cut at each node of the subject graph. The main advantage of the algorithm is that it works for very large LUT size while offering dramatic improvements in memory and runtime. For 10-input LUTs, the memory is reduced 2...

Workload Characteristics of a Multi-cluster Supercomputer

This paper presents a comprehensive characterization of a multi-cluster supercomputer workload using twelve-month scientific research traces. Metrics that we characterize include system utilization, job arrival rate and interarrival time, job cancellation rate, job size (degree of parallelism), job run time, memory usage, and user/group behavior. Correlations between metrics (job runtime and me...

SPMPool: Runtime SPM Management for Embedded Many-Cores

Distributed scratchpad memories (SPM) in embedded many-core systems require careful selection of data placement such that good performance can be achieved. In this paper, we propose SPMPool to share the available on-chip scratchpads on many-cores among executing applications in order to reduce the overall memory access latency. By pooling SPM resources, we can assign underutilized memory resour...

Journal:

Microprocessors and Microsystems - Embedded Hardware Design

Volume 38, Issue

Pages -

Publication date: 2014
